User-Centered Analysis of Corpora Using Semantic Features Redundancy

Authors

Thibault Roy

Pierre Beust

Stéphane Ferrari

Abstract

Accessing textual information is still a complex activity when users have to browse through large corpora or long texts. To help users with such tasks, we propose a model dedicated to the lexical representation of thematic domains, as well as tools for personal corpus analysis. The lexical model is a differential one, inspired by Saussure's semiotics. It consists of structuring and describing lexical units by means of semantic features, which are differences between the meanings of terms. Each thematic domain is represented by a set of terms characterized by many semantic features. These are built by the user through an interactive tool developed by our team. Generally, domains include between 60 and 100 terms. Lexical resources are identified in the corpus with the ProxiDocs tool, which returns interactive maps and reports built from the distribution of domain terms in the corpus. The maps reveal proximities and links between texts or sets of texts. The semantic features most often repeated in texts and in sets of texts are pointed out on the maps. Following Interpretative Semantics, we call such a redundancy "intertextual isotopies". These intertextual isotopies can represent redundancies of global domains, which reveal the topics of the considered texts, or can indicate a local semantic property, such as an expression of violence, shared by some texts of the corpus. In this paper, we first present the lexical model as well as the related tools for building personal lexical resources and interactively visualising them in a corpus. The second section deals with notions linked to semantic features, particularly intertextual isotopies, and proposes methods to detect them in a corpus. Section 3 presents two experiments illustrating how such a redundancy can be useful for two kinds of tasks: information retrieval in a corpus of Web pages, and semantic analysis of conceptual metaphors in a domain-specific corpus of newspapers. Finally, we conclude on the importance of taking into account intertextual isotopies, and more generally the global context established by the corpus, in information access tasks.
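The abstract describes intertextual isotopies as semantic features that are repeatedly activated across several texts of a corpus. The Python sketch below illustrates one simple way such redundancy could be detected; it is not the ProxiDocs implementation. The lexicon, the feature names, the naive tokenizer, and the thresholds `min_per_text` and `min_texts` are all hypothetical assumptions introduced for illustration.

```python
from collections import Counter
import re

# HYPOTHETICAL lexical resource: each domain term maps to the semantic
# features that differentiate its meaning (terms and features are illustrative).
LEXICON: dict[str, set[str]] = {
    "war": {"/conflict/", "/violence/", "/collective/"},
    "attack": {"/conflict/", "/violence/", "/action/"},
    "treaty": {"/conflict/", "/agreement/", "/collective/"},
    "election": {"/politics/", "/collective/", "/choice/"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokenization (a simplification: no lemmatization)."""
    return re.findall(r"[a-z]+", text.lower())


def feature_counts(text: str, lexicon: dict[str, set[str]]) -> Counter:
    """Count how often each semantic feature is activated by domain terms in one text."""
    counts: Counter = Counter()
    for token in tokenize(text):
        for feature in lexicon.get(token, ()):
            counts[feature] += 1
    return counts


def intertextual_isotopies(texts: list[str], lexicon: dict[str, set[str]],
                           min_per_text: int = 2, min_texts: int = 2) -> dict[str, int]:
    """Return features repeated (>= min_per_text times) inside at least
    min_texts different texts, mapped to the number of such texts."""
    support: Counter = Counter()
    for text in texts:
        for feature, n in feature_counts(text, lexicon).items():
            if n >= min_per_text:  # feature is redundant within this text
                support[feature] += 1
    # Keep only features whose redundancy is shared by several texts.
    return {f: k for f, k in support.items() if k >= min_texts}


if __name__ == "__main__":
    corpus = [
        "The war began with an attack at dawn.",
        "After the attack, the war spread to the north.",
        "The election was held after the treaty.",
    ]
    result = intertextual_isotopies(corpus, LEXICON)
    print(sorted(result.items()))  # [('/conflict/', 2), ('/violence/', 2)]
```

In this toy corpus, "/violence/" is repeated in the first two texts only, which resembles the kind of local semantic property the abstract mentions, whereas "/collective/" is redundant in just one text and is therefore not reported.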


Similar resources

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis. Methods: The method of this research is log anal...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

UCD-S1: A hybrid model for detecting semantic relations between noun pairs in text

We describe a supervised learning approach to categorizing inter-noun relations, based on Support Vector Machines, that builds a different classifier for each of seven semantic relations. Each model uses the same learning strategy, while a simple voting procedure based on five trained discriminators with various blends of features determines the final categorization. The features that character...

Fuzzy Approach Topic Discovery in Health and Medical Corpora

The majority of medical documents and electronic health records (EHRs) are in text format that poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to make the text processing automatic. One of t...

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches, guide the researchers to use combining approaches and semi-automatic retrieval using the user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kind of qu...

Publication date: 2007